Delay-Optimal Power and Precoder Adaptation for
Multi-stream MIMO Systems

Vincent K. N. Lau and Yan Chen 






Abstract: In this paper, we consider delay-optimal MIMO
precoder and power allocation design for a MIMO link in
wireless fading channels. There are $L$ data streams spatially
multiplexed onto the MIMO link with heterogeneous packet
arrivals and delay requirements. The transmitter is assumed to
have knowledge of the channel state information (CSI) as well as
the joint queue state information (QSI) of the $L$ buffers. Using an
$L$-dimensional Markov Decision Process (MDP), we obtain optimal
precoding and power allocation policies for the general delay regime,
which consist of an online solution and an offline solution. The
online solution has negligible complexity but the offline solution
has worst case complexity $O((N+1)^L)$, where $N$ is the buffer
size. Using static sorting of the $L$ eigenchannels, we decompose
the MDP into $L$ independent one-dimensional subproblems and
obtain a low complexity offline solution with linear complexity
order $O(NL)$ and close-to-optimal performance.



I. Introduction 

Multiple Input Multiple Output (MIMO) communication is 
well-known to boost the wireless spectral efficiency through 
spatial multiplexing. Substantial performance gain can be
obtained by power and precoder adaptation when channel
state information is available at the transmitter (CSIT).
In [1], [2], a linear MIMO precoder design framework is 
proposed to minimize the weighted sum of mean square errors 
(MSE) assuming knowledge of perfect CSIT. In [3] and [4], 
MIMO precoder design utilizing either limited feedback or 
outdated CSIT is proposed. Yet, all these works assumed that 
the transmitter has infinite buffer and the information flow 
is delay insensitive, and focused on optimizing the physical 
layer performance (such as capacity, throughput or MSE). In
practice, it is very important to consider the delay performance 
in addition to the conventional physical layer performance in 
MIMO transceiver design. 

A combined framework that accounts for both queueing
delay and physical layer performance is not trivial, as
it involves both queueing theory (to model the queue
dynamics) and information theory (to model the physical layer
dynamics). In [5], it is shown that naive water-filling (which is
optimal in the information theoretic sense) is not always a good
strategy with respect to the delay performance. In general,
there are two approaches to deal with delay problems. The
first approach converts the delay constraint into an average rate
constraint using the tail probability in the large delay regime and
solves the optimization problem using an information theoretic
formulation based on the rate constraint [6]-[8]. While this
approach allows potentially simple solution, the control policy 
will be a function of CSIT only and such control will be 
good only for large delay regime. In general, the delay-optimal 
power and precoder adaptation will be a function of both the 
CSI and the queue state information (QSI). In the second 
approach, the problem of finding the optimal control policy 
(to minimize delay) is cast into a Markov Decision Problem 
(MDP) or stochastic control problem [9]. Unfortunately, it is
well known that general solution methods for MDPs (e.g. value
iteration and policy iteration) are not easy to apply, even for a
simple scenario such as the SISO channel [10], [11]. In [12], [13], the authors
showed that the longest queue highest possible rate (LQHPR) 
policy is delay-optimal for symmetric multi-access fading 
channels. Works considering delay sensitive scheduling can be 
found in [14] and [15]. While all the above works addressed 
different aspects of the delay sensitive resource allocation 
problem, there are still some first order issues to be addressed. 

• Low complexity optimal control policy for the delay
sensitive resource allocation problem in the general delay
regime: Most of the existing works considered large
delay asymptotic solutions. However, the practical operating
region for delay sensitive traffic is usually in the low
delay regime, where the asymptotic simplifications cannot be
applied. Hence, it is important to obtain a low complexity
control policy for the general delay regime.

• Coupling among multiple delay-sensitive heterogeneous
data streams: Most of the above works considered
a single stream wireless link only [16]. While [12], [13]
considered multi-user systems, the framework applies
to symmetric situations (homogeneous users) only
and cannot be extended to situations with heterogeneous
users. When we have heterogeneous data streams, the
problem is difficult as the optimal policy is
generally coupled with the joint queue state of all
the heterogeneous streams. The general solution involves
solving a multi-dimensional MDP with exponential order
of complexity w.r.t. the number of streams.

In this paper, we shall attempt to address the above issues
for the delay-sensitive multi-stream MIMO power and precoder
adaptation design. Specifically, we consider an $N_t \times N_r$
MIMO link with $L \le \min\{N_t, N_r\}$ spatially multiplexed
heterogeneous data streams (with different delay requirements).
This represents an important scenario where a multi-antenna
terminal receives data from multiple application streams (with
different delay requirements) simultaneously through a MIMO
link from the base station. The transmitter is assumed to
have knowledge of both the CSI and the QSI. Using an
$L$-dimensional MDP formulation on the embedded Markov chain,
we derive an optimal control policy to minimize the weighted
average delays of the $L$ application streams for the general delay
regime. The optimal policy consists of an online procedure
and an offline procedure. The online procedure has negligible
complexity but the offline procedure could be quite complex
for large $L$. Using static eigenchannel mapping, we decompose
the $L$-dimensional MDP problem into $L$ one-dimensional
MDP subproblems and obtain a low complexity solution
with worst case complexity of $O(NL)$ but close-to-optimal
performance.

Fig. 1. Top level system model.

The paper is organized as follows. In Section II, we shall
elaborate the system model, physical layer model as well as the
queue model. In Section III, we formulate the delay-sensitive
precoder and power adaptation design as an MDP. In Section
IV, we derive the low complexity optimal control policy. In
Section V, we elaborate the extensions when the CSIT is
outdated. Section VI illustrates the delay performance of the
proposed algorithm by simulations. Finally, we conclude with
a brief summary of results in Section VII.

II. System Models 

In this section, we shall elaborate the system model, physical
layer model as well as the underlying queueing model. Fig. 1
illustrates the top level system model where $L$ application
streams are spatially multiplexed and delivered to a multi-antenna
terminal (with $N_r$ antennas) from a multi-antenna
source (with $N_t$ antennas). These $L$ application streams may
have different source arrival rates and delay requirements.¹

A. MIMO Physical Layer Model 

We consider the use of MIMO linear transceivers, composed
of a linear precoder at the transmitter (represented by a
matrix $\mathbf{P} \in \mathbb{C}^{N_t \times L}$) and a linear equalizer at the receiver
(represented by a matrix $\mathbf{W} \in \mathbb{C}^{N_r \times L}$). The transmitted
vector $\mathbf{x} \in \mathbb{C}^{N_t}$ is given by $\mathbf{x} = \mathbf{P}\mathbf{s}$, where $\mathbf{s} \in \mathbb{C}^{L}$ contains
the normalized data symbols from the $L$ application streams
with $E[\mathbf{s}\mathbf{s}^H] = \mathbf{I}$, so that the total average transmitted power
is $E[\|\mathbf{x}\|^2] = \mathrm{Tr}(\mathbf{P}\mathbf{P}^H)$. Similarly, the estimated
received symbols (corresponding to the equalizer outputs) are
given by $\hat{\mathbf{s}} = \mathbf{W}^H\mathbf{y}$, where $\mathbf{y} \in \mathbb{C}^{N_r}$ is the channel output,
i.e. $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{z}$. Here $\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$ is the MIMO channel
state information (CSI) and $\mathbf{z} \in \mathbb{C}^{N_r}$ is a zero-mean circularly
symmetric complex Gaussian noise vector with normalized
covariance $\mathbf{I}$. Both the transmitter and the receiver are assumed
to have perfect knowledge of the MIMO CSI.²

¹ This corresponds to the scenario where the multi-antenna terminal may be
running different applications simultaneously.

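As a result, the equivalent channel (with precoder, MIMO
channel and the equalizer) for the $L$ data streams is
$\hat{\mathbf{s}} = \mathbf{W}^H\mathbf{H}\mathbf{P}\mathbf{s} + \mathbf{W}^H\mathbf{z}$ and the SINR of the $i$-th data stream
is $\mathrm{SINR}_i(\mathbf{P}) = |\mathbf{w}_i^H\mathbf{H}\mathbf{p}_i|^2 / (\mathbf{w}_i^H\mathbf{A}_i\mathbf{w}_i)$, where
$\mathbf{A}_i = \sum_{j \neq i}\mathbf{H}\mathbf{p}_j\mathbf{p}_j^H\mathbf{H}^H + \mathbf{I}$ and $\mathbf{w}_i$, $\mathbf{p}_i$ denote the $i$-th columns of
the equalizer $\mathbf{W}$ and the precoding matrix $\mathbf{P}$, respectively.
For sufficiently high SINR, the symbol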
error probability (SEP) of QAM constellation is [17]: 

for some constant ki. Hence, given a sufficiently small target 
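SEP $\epsilon$, the data rate $R_i$ (bits per symbol) of the $i$-th data
stream is related to $\mathrm{SINR}_i(\mathbf{P})$ as
$R_i = \log_2(1 + \alpha(\epsilon)\mathrm{SINR}_i(\mathbf{P}))$, where $\alpha(\epsilon)$ is a constant depending on
the target SEP $\epsilon$. Since the receiver has perfect CSIR and the
data rate is an increasing function of $\mathrm{SINR}_i$, it is shown
that for any precoder $\mathbf{P}$, the Wiener filter
$\mathbf{W} = (\mathbf{H}\mathbf{P}\mathbf{P}^H\mathbf{H}^H + \mathbf{I})^{-1}\mathbf{H}\mathbf{P}$ simultaneously maximizes
$\{\mathrm{SINR}_1, .., \mathrm{SINR}_L\}$ [18]. As a result, the conditional average SINR of the $i$-th
data stream after Wiener filtering is given by
$\mathrm{SINR}_i(\mathbf{P}) = \mathbf{p}_i^H\mathbf{H}^H\mathbf{A}_i^{-1}\mathbf{H}\mathbf{p}_i$. Define the instantaneous MSE matrix as

$$\mathbf{E}(\mathbf{P}) = E[(\hat{\mathbf{s}} - \mathbf{s})(\hat{\mathbf{s}} - \mathbf{s})^H] = (\mathbf{I} + \mathbf{P}^H\mathbf{H}^H\mathbf{H}\mathbf{P})^{-1}. \tag{1}$$

Note that the diagonal elements of $\mathbf{E}$ contain the instantaneous
MSEs of the $L$ data streams. Using the matrix inversion
lemma [19], it can be shown that $\mathrm{SINR}_i(\mathbf{P}) = \mathbf{E}_{ii}^{-1}(\mathbf{P}) - 1$.
Hence, the supported data rate at the target SEP $\epsilon$ is given by
$R_i = \log_2(1 + \alpha(\epsilon)(\mathbf{E}_{ii}^{-1}(\mathbf{P}) - 1))$.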
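The relation between the per-stream SINR under the Wiener filter and the diagonal of the MSE matrix can be checked numerically. The following is a minimal sketch, assuming NumPy and randomly drawn $\mathbf{H}$ and $\mathbf{P}$; the function name `wiener_sinr_and_mse` and the dimensions are illustrative choices, not part of the paper.

```python
import numpy as np

def wiener_sinr_and_mse(H, P):
    """Return per-stream SINR under the Wiener filter and diag of E(P)."""
    n_r, L = H.shape[0], P.shape[1]
    G = H @ P                                   # effective channel, N_r x L
    # Wiener filter W = (H P P^H H^H + I)^{-1} H P
    W = np.linalg.solve(G @ G.conj().T + np.eye(n_r), G)
    sinr = np.empty(L)
    for i in range(L):
        others = np.delete(G, i, axis=1)        # interfering streams j != i
        A_i = others @ others.conj().T + np.eye(n_r)
        w = W[:, i]
        signal = abs(w.conj() @ G[:, i]) ** 2
        sinr[i] = signal / np.real(w.conj() @ A_i @ w)
    # MSE matrix E(P) = (I + P^H H^H H P)^{-1}
    E = np.linalg.inv(np.eye(L) + G.conj().T @ G)
    return sinr, np.real(np.diag(E))

# Illustrative dimensions (assumed): N_t = 4, N_r = 3, L = 2.
rng = np.random.default_rng(0)
n_t, n_r, L = 4, 3, 2
H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
P = (rng.standard_normal((n_t, L)) + 1j * rng.standard_normal((n_t, L))) / np.sqrt(2)
sinr, mse = wiener_sinr_and_mse(H, P)
print(sinr, 1.0 / mse - 1.0)  # the two vectors should agree
```

The two printed vectors agreeing is the numerical counterpart of $\mathrm{SINR}_i(\mathbf{P}) = \mathbf{E}_{ii}^{-1}(\mathbf{P}) - 1$.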

B. Queue Model, System States and Control Policy 

In this paper, the time dimension is partitioned into scheduling
slots (each slot has $\tau$ channel uses) and we assume that the
CSI $\mathbf{H}$ remains quasi-static³ within a scheduling slot and is i.i.d.
between scheduling slots. There are $L$ buffers (each of length
$N$) at the transmitter for the $L$ application streams respectively.
For simplicity, we assume the $L$ application sources follow
Poisson arrivals with mean arrival rates $(\lambda_1, .., \lambda_L)$ (number
of packets per channel use). The packet length of the $i$-th
data source, $N_i$, follows an exponential distribution with mean
packet size $\bar{N}_i$ (bits per packet). The transmitter is assumed
to have knowledge of the QSI of the $L$ buffers. Specifically,
the QSI at time $t$ is denoted by
$\mathbf{Q}(t) = (Q_1(t), \ldots, Q_L(t)) \in \{0, \ldots, N\}^L$, where $Q_i(t)$ is the number of packets in the
$i$-th buffer at time $t$. As a result, the observed system
state at the transmitter, $\chi = (\mathbf{H}, \mathbf{Q})$, consists of both the
CSIT and the joint QSI. Given an observed system state
realization $\chi$, the transmitter may adjust the transmit power
and precoding matrix $\mathbf{P}$ according to a stationary precoding
policy⁴ $\pi = \{\mathbf{P}(\chi)\}$ defined below.

² We elaborate the case when the CSIT is outdated in Section V.

³ This assumption is realistic for pedestrian mobility users where the channel
coherence time is around 50 ms but the typical frame duration is less than 5 ms
in next generation wireless systems such as WiMAX.

Definition 1: (Stationary Precoding and Power Control
Policy) A stationary transmit power and precoding policy
$\pi : \{0,..,N\}^L \times \mathbb{C}^{N_r \times N_t} \to \mathbb{C}^{N_t \times L}$ is defined as the mapping
from the currently observed system state $\chi = (\mathbf{Q}, \mathbf{H})$ to a linear
precoder $\pi(\chi) = \mathbf{P}(\chi)$.⁵ The set of all feasible stationary
policies is defined as
$\mathcal{P} = \{\pi : \mathrm{Tr}(E[\pi(\chi)\pi^H(\chi)|\mathbf{Q}]) > 0, \forall \mathbf{Q} \in \{0,1,2,..,N\}^L\}$.

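Since the packet length is exponentially distributed with
mean packet length $\bar{N}_i$, the packet service time follows an
exponential distribution with conditional mean service rate
(conditioned on system state $\chi$) [packets per channel use]

$$\mu_i(\chi) = \frac{R_i}{\bar{N}_i} = \frac{1}{\bar{N}_i}\log_2\left(1 + \alpha(\epsilon)\left(\mathbf{E}_{ii}^{-1}(\pi(\chi)) - 1\right)\right). \tag{2}$$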

The overall delay dynamics of the $L$-stream multiplexed
MIMO system can be modeled by $L$ M/M/1 queues as
illustrated in Fig. 1. The $L$ queues are coupled together via
the precoding policy $\pi$ and the transmit power constraint. We
shall derive an optimal stationary precoding policy to minimize
the average delays of the $L$ spatially multiplexed data streams
subject to an average transmit power constraint. Specifically, the
average delay (in packets) of the $i$-th data stream is given by

$$\bar{T}_i(\pi) = \limsup_{M \to \infty} \frac{1}{M} E\left[\sum_{m=1}^{M} Q_{i,m}\right], \quad \forall i \in \{1, \ldots, L\} \tag{3}$$

where $Q_{i,m} = Q_i(m\tau)$ is the QSI of the $i$-th buffer observed
at $t = m\tau$. The average transmit power constraint is given by:

$$\bar{P}_{tx}(\pi) = \limsup_{M \to \infty} \frac{1}{M} E\left[\sum_{m=1}^{M} \mathrm{Tr}\left(\pi(\chi_m)\pi^H(\chi_m)\right)\right] \le P_0 \tag{4}$$

where $\pi(\chi_m)$ denotes the precoder applied at $t = m\tau$.
Note that the transmitter may adjust the precoding and power
control actions only at the beginning of scheduling slots
and the control action remains unchanged in between the
scheduling slots. The average delay is related to the transmit
power via the packet service rates $\{\mu_i(\chi)\}$. The
delay optimization problem can be formally written as:

Problem 1: (Delay Optimal Policy) For some
$\beta = (\beta_1, \beta_2, .., \beta_L)$ (such that $\beta_i > 0$ for all $i$), we seek to find a
stationary policy $\pi \in \mathcal{P}$ that minimizes

$$J_\beta^\pi(\chi_0) = \sum_{i=1}^{L} \beta_i \bar{T}_i(\pi) + \gamma \bar{P}_{tx}(\pi) \tag{5}$$

where $\chi_0$ denotes the initial system state. The positive weighting
factors $\beta$ indicate the relative importance of buffer delay
among the $L$ data streams and for each given $\beta$, the solution to
(5) corresponds to a point on the Pareto optimal delay tradeoff
boundary. The constant $\gamma > 0$ is the Lagrange multiplier for
the average transmit power constraint in (4).

⁴ It is shown in [9] that for a finite state MDP, a stationary and history independent
policy is optimal. Hence, there is no loss of generality in considering policies that
are functions of the current system state only.

⁵ Note that since a linear precoder $\mathbf{P}$ can be decomposed into $\mathbf{U}_p\mathbf{S}_p\mathbf{V}_p$
where $\mathbf{U}_p$ and $\mathbf{V}_p$ are unitary matrices (denoting the precoding actions)
and $\mathbf{S}_p$ is a diagonal matrix (denoting the power allocation action), we
shall represent both the precoding and power allocation actions by a single
precoding matrix $\pi(\chi)$.



III. Markov Decision Problem Formulation 

In this section, we shall formulate the delay minimization
problem as a Markov Decision Process and discuss the optimality
condition. We shall first introduce the embedded Markov
chain and the induced reward random sequence.

A. Embedded Markov Chain and MDP Formulation 

Recall that $\{\mathbf{Q}(t)\}$ is the continuous time random process
(denoting the joint queue state of the $L$ data streams) and
$\{\mathbf{Q}_m\}$ is the corresponding induced discrete time random
process (denoting the joint queue states at observation epochs
$\{0, \tau, 2\tau, \ldots\}$) with $\mathbf{Q}_m = \mathbf{Q}(m\tau)$. The problem of finding
the optimal control policy $\pi$ to minimize system delay in
Problem 1 is in general quite tedious even for obtaining
numerical solutions. To obtain a simple solution, we consider
the case where the scheduling slot duration (or frame duration)
$\tau$ is substantially smaller than the average packet interarrival
time as well as the average packet service time ($\tau \ll 1/\lambda_i$ and
$\tau \ll 1/\mu_i(\chi)$).⁶ Suppose the system state at the $m$-th observation
epoch is $\chi_m = \{\mathbf{H}_m; \mathbf{Q}_m\}$. At the $(m+1)$-th observation
epoch $t = (m+1)\tau$, one of the following events may happen:
1) packet arrival from the $i$-th data source with probability
$p^{(i)}_{q,q+1} = \lambda_i\tau$; 2) packet departure from the $i$-th data buffer
with probability $p^{(i)}_{q,q-1} = \bar{\mu}_i(\mathbf{Q}_m)\tau = E_{\mathbf{H}}[\mu_i(\chi_m)|\mathbf{Q}_m]\tau$; 3)
no change in the $i$-th buffer state with probability
$p^{(i)}_{q,q} = 1 - p^{(i)}_{q,q-1} - p^{(i)}_{q,q+1}$.⁷ Therefore, the embedded discrete time
random sequence $\{\mathbf{Q}_m\}$ is an irreducible Markov chain
induced by a stationary policy $\pi \in \mathcal{P}$. In addition, given a
stationary policy $\pi$, the Markov chain $\{\mathbf{Q}_m\}$ depends on $\pi$
via the conditional average packet service rates $\bar{\mu}_i(\mathbf{Q}_m)$ only.
On the other hand, since the CSIT $\{\mathbf{H}_m\}$ is i.i.d. between any
two observation epochs, the optimization objective function
(average cost per stage) $J_\beta^\pi(\chi_0)$ evaluated at the discrete time
observation epochs can be expressed as:

$$J_\beta^\pi(\chi_0) = \limsup_{M \to \infty} \frac{1}{M}\sum_{m=1}^{M} E\left[g(\mathbf{Q}_m, \bar{\pi}(\mathbf{Q}_m))\right] \tag{6}$$

where

$$g(\mathbf{Q}_m, \bar{\pi}(\mathbf{Q}_m)) = \sum_{i=1}^{L}\beta_i Q_{i,m} + \gamma\,\mathrm{Tr}[\bar{\pi}(\mathbf{Q}_m)] \tag{7}$$

$$\bar{\pi}(\mathbf{Q}_m) = E_{\mathbf{H}}\left[\pi(\chi_m)\pi^H(\chi_m)|\mathbf{Q}_m\right]. \tag{8}$$
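The per-buffer transition probabilities listed above (arrival, departure, no change) define a birth-death chain. The following is a minimal sketch of the corresponding one-step transition matrix for a single buffer, assuming NumPy; the function name `embedded_chain_matrix`, the numeric values, and the boundary handling (an arrival at a full buffer and a departure at an empty buffer leave the state unchanged) are assumptions made for illustration.

```python
import numpy as np

def embedded_chain_matrix(lam, mu_bar, tau):
    """One-step transition matrix of a single buffer's embedded chain.

    lam    : mean arrival rate (packets per channel use)
    mu_bar : mu_bar[q] is the conditional mean service rate at queue length q
    tau    : slot duration, assumed small so that lam*tau + mu_bar*tau <= 1
    """
    N = len(mu_bar) - 1
    T = np.zeros((N + 1, N + 1))
    for q in range(N + 1):
        up = lam * tau if q < N else 0.0          # arrival (blocked when full)
        down = mu_bar[q] * tau if q > 0 else 0.0  # departure (none when empty)
        if q < N:
            T[q, q + 1] = up
        if q > 0:
            T[q, q - 1] = down
        T[q, q] = 1.0 - up - down                 # no change
    return T

# Illustrative values (assumed): buffer size N = 3.
lam, tau = 0.2, 0.1
mu_bar = np.array([0.0, 0.30, 0.35, 0.40])
T = embedded_chain_matrix(lam, mu_bar, tau)
print(T)
```

Each row sums to one, and only transitions between neighbouring queue lengths are non-zero, which reflects the small-$\tau$ assumption.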

Given a stationary policy $\pi$, the Markov chain $\{\mathbf{Q}_m\}$ induces
a random sequence of reward functions $\{g(\mathbf{Q}_m, \bar{\pi}(\mathbf{Q}_m))\}$
depending on the chosen policy $\pi \in \mathcal{P}$. From (7), the evolution
of the random sequence of reward functions $\{g(\mathbf{Q}_m, \bar{\pi}(\mathbf{Q}_m))\}$
depends on $\pi \in \mathcal{P}$ via the conditional average transmit power
cost $\mathrm{Tr}[\bar{\pi}(\mathbf{Q}_m)]$ only. Hence, the delay-optimization
problem in Problem 1 can be completely characterized by
a multi-dimensional infinite horizon Markov Decision Process
(MDP) with partial system state $\mathbf{Q}$, per-stage reward function
$g(\mathbf{Q}, \bar{\pi}(\mathbf{Q}))$, and the conditional average precoding action
$\bar{\pi}(\mathbf{Q})$. The state transition probability of the embedded MDP
$\Pr[\mathbf{Q}_{m+1}|\mathbf{Q}_m, \bar{\pi}(\mathbf{Q}_m)]$ is illustrated in Fig. 2.

⁶ This is a mild assumption which could be justified in many applications.
For example, in WiMAX, a frame duration is around 2 ms while the target
queueing delay for video streaming is around 200 ms or more.

⁷ Since $\tau$ is small, the probability of multiple packet arrivals or departures
among the $L$ data sources is negligible and hence $p_{q,p} = 0$ for $|p - q| > 1$.

Fig. 2. State transition diagram for the $L$-dimensional Markov chain $\{\mathbf{Q}_m\}$ with
$N$ states in each dimension. $L = 2$ for illustration.

B. Bellman Condition and Optimal Precoding Structure

In general, the sequence of average costs
$\{E[g(\mathbf{Q}_m, \bar{\pi}(\mathbf{Q}_m))]\}$ of the infinite horizon MDP under
a chosen stationary policy $\pi \in \mathcal{P}$ may not converge at all.
Since the induced Markov chain $\{\mathbf{Q}_m\}$ is irreducible for
any stationary policy $\pi \in \mathcal{P}$, the limit of the long run average
cost $J_\beta^\pi(\chi_0)$ converges and is independent of the initial state
$\chi_0$. For the infinite horizon MDP described by Fig. 2, the
optimizing policy can be obtained by solving the Bellman
equation [9] recursively w.r.t. $(\theta, \{V(q_1,..,q_L)\})$ as below:

$$\theta + V(q_1,..,q_L) = \min_{\bar{\pi}(q_1,..,q_L)}\Big\{ g(q_1,..,q_L,\bar{\pi}(q_1,..,q_L)) + \sum_{i=1}^{L}\lambda_i\tau V(q_1,..,(q_i+1)\wedge N,..,q_L) + \sum_{i=1}^{L}\tau\bar{\mu}_i(q_1,..,q_L)V(q_1,..,[q_i-1]^+,..,q_L) + V(q_1,..,q_L)\Big(1 - \sum_{i=1}^{L}\lambda_i\tau - \sum_{i=1}^{L}\tau\bar{\mu}_i(q_1,..,q_L)\Big)\Big\} \tag{9}$$

for all $(q_1,..,q_L) \in \{0,1,..,N\}^L$, where $x \wedge y = \min\{x,y\}$.
If there is a $(\theta, \{V(q_1,..,q_L)\})$ satisfying (9), then
$\theta = \inf_{\pi \in \mathcal{P}} J_\beta^\pi$ is the optimal average reward per stage. Furthermore,
since the induced Markov chain $\{\mathbf{Q}_m\}$ is irreducible
for any stationary policy $\pi \in \mathcal{P}$, the solution to (9) is unique.

Note that solving (9) is still very complex for the
following reasons. Firstly, it involves $L$-dimensional recursions and, as
a result, brute-force solutions have exponential order of complexity
w.r.t. $L$. Secondly, each step of the recursion involves
optimization w.r.t. the matrix precoder $\pi(\chi)$. In the following, we
shall utilize the underlying structure to deduce the optimal
precoding structure for $\pi(\chi)$ first.

Given any QSI $\mathbf{Q}$ and $V(\mathbf{Q})$, let
$\bar{\pi}(\mathbf{Q}) = \{\mathbf{P} = \pi(\mathbf{Q},\mathbf{H}) \in \mathbb{C}^{N_t \times L} : \forall \mathbf{H} \in \mathbb{C}^{N_r \times N_t}\}$ be the set of all precoding
actions for every possible CSIT realization (given a certain
QSI realization $\mathbf{Q}$). The optimization in the RHS of (9) is
equivalent to the following form:

$$\min_{\bar{\pi}(\mathbf{Q})} E_{\mathbf{H}}\big[g(d(\mathbf{E}(\mathbf{P})), \mathbf{P})\,\big|\,\mathbf{Q}\big] \tag{10}$$

where $d(\mathbf{A})$ denotes the diagonal elements of matrix $\mathbf{A}$ and

$$g(d_1,..,d_L,\mathbf{P}) = \sum_{i=1}^{L}\frac{\tau}{\bar{N}_i}\big(V(q_1,..,[q_i-1]^+,..,q_L) - V(q_1,..,q_L)\big)\log_2\big(1+\alpha(\epsilon)(d_i^{-1}(\mathbf{E}(\mathbf{P}))-1)\big) + \gamma\,\mathrm{Tr}[\mathbf{P}\mathbf{P}^H]. \tag{11}$$

Note that $d(\mathbf{E}(\mathbf{P}))$ is related to the precoding matrix $\pi(\mathbf{H},\mathbf{Q})$
according to (1) and (2). Since the optimization variables
in (10) involve the set of actions $\bar{\pi}(\mathbf{Q})$ for all CSIT
realizations, the problem can be decomposed into solving
$\min_{\mathbf{P}} g(d(\mathbf{E}(\mathbf{P})), \mathbf{P})$ for each CSIT and QSI realization. To
derive the optimal solution, we first have the following lemma.

Lemma 1: If $(\theta, \{V(q_1,..,q_L)\})$ is a solution to the Bellman
equation (9), then $V(q_1,..,q_L)$ is a monotonically
non-decreasing function in all the $L$ arguments.

As a result of Lemma 1, $g(d_1,..,d_L;\mathbf{P})$ is a Schur-concave
function in $(d_1,..,d_L)$ and the optimal transmit precoder
matrix is summarized in the following theorem.

Theorem 1: (Optimal Precoding Matrix) For any realization
of system state $\chi$ (QSI $\mathbf{Q}$, CSIT $\mathbf{H}$), the optimal precoding
action $\pi(\chi) = \mathbf{P}$ w.r.t. (10) is given by:

$$\mathbf{P} = \mathbf{U}\mathbf{S}_p \tag{12}$$

where $\mathbf{U} \in \mathbb{C}^{N_t \times L}$ is a unitary matrix consisting of $L$ eigenvectors
of $\mathbf{H}^H\mathbf{H}$ corresponding to the $L$ largest eigenvalues
and $\mathbf{S}_p = \mathrm{diag}\{\sqrt{p_1},..,\sqrt{p_L}\}$ is a diagonal matrix containing
the power allocations over the $L$ spatial channels. Note that
the $L$ largest eigenvalues $\{\xi_1,..,\xi_L\}$ are sorted in the same
order as $\eta_i = V(q_1,...,q_L) - V(q_1,...,[q_i-1]^+,...,q_L)$.

All the proofs are omitted for lack of space and interested
readers can refer to our full version in [20] for details. In
general, the delay-optimal precoding and power allocation
actions should be a function of both CSIT and QSI. From
Theorem 1, the optimal precoding matrix $\mathbf{U}$ seems to be a
function of the CSIT $\mathbf{H}$ only. However, this is not the case, as
the $L$ largest eigenvalues $\{\xi_1,..,\xi_L\}$ have to be
sorted in the same order as $\{\eta_i\}$, where
$\eta_i = V(q_1,..,q_L) - V(q_1,..,[q_i-1]^+,..,q_L)$ is a function of the QSI $\mathbf{Q}$. Hence, the
precoding matrix $\mathbf{U}$ is indeed (implicitly) a function of both the CSIT and
the QSI, and it is this sorting requirement
of the eigenvalues that couples the MDP analysis of the $L$ data
streams together.

Remark 1: Note that from Theorem 1, the delay-optimal
precoder has the MIMO-channel diagonalizing structure and,
as a result, the subsequent delay-optimization and solutions
can be applied to general $L$-parallel channels such as OFDM
systems as well.

C. Optimal Power Allocation Policy 

Using the precoder structure given by Theorem 1, the conditional
average MSE becomes
$d[\mathbf{E}] = [(1+p_1\xi_1)^{-1}, \ldots, (1+p_L\xi_L)^{-1}]$. Hence, the conditional average service rate becomes

$$\bar{\mu}_i(\mathbf{Q}) = \frac{1}{\bar{N}_i}E_{\mathbf{H}}\big[\log_2(1+\alpha(\epsilon)p_i\xi_i)\,\big|\,\mathbf{Q}\big]. \tag{13}$$

Therefore, without loss of generality, we shall consider optimization
w.r.t. the power allocation policy $\varphi$ defined as:

Definition 2: (Power Allocation Policy) A power allocation
policy $\varphi : \{0,..,N\}^L \times \mathbb{C}^{N_r \times N_t} \to \mathbb{R}_+^L$ is defined as the mapping
from the currently observed system state $\chi = (\mathbf{Q}, \mathbf{H})$ to a
power allocation vector $\varphi(\chi) = (p_1,..,p_L)$, where $\varphi_i(\chi) = p_i$
gives the power allocation to the $i$-th data stream. Furthermore,
$\varphi(q_1,..,q_L) = \{(p_1,..,p_L) = \varphi(\mathbf{H}, \mathbf{Q} = (q_1,..,q_L)) : \mathbf{H} \in \mathbb{C}^{N_r \times N_t}\}$
denotes the set of power allocation actions for all
CSIT realizations at a given QSI $\mathbf{Q} = (q_1,..,q_L)$.

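The Bellman equation in (9) can thus be written as:

$$\sum_{i=1}^{L}\lambda_i\,\delta V_i(q_1,..,(q_i+1)\wedge N,..,q_L) + \sum_{i=1}^{L}\beta_i q_i - \phi\big(\delta V_1(q_1,..,q_L),\ldots,\delta V_L(q_1,..,q_L)\big) = \theta, \tag{14}$$

for $q_i = 0,..,N$ (and the initial condition can be set
as $V(0,..,0) = 0$), where
$\delta V_i(q_1,..,q_L) \triangleq \tau\big(V(q_1,..,q_i,..,q_L) - V(q_1,..,[q_i-1]^+,..,q_L)\big)$, and $\phi(\eta_1,\ldots,\eta_L)$ is defined as

$$\phi(\eta_1,\ldots,\eta_L) = \sup E_{\mathbf{H}}\left[\sum_{i=1}^{L}\left(\frac{\eta_i}{\bar{N}_i}\log_2\big(1+\alpha(\epsilon)p_i(\mathbf{H})\xi_{[i]}\big) - \gamma p_i(\mathbf{H})\right)\right].$$

The supremum is taken w.r.t. $p_1(\mathbf{H}),\ldots,p_L(\mathbf{H})$ and $\{\xi_{[i]}\}$
denotes the $L$ largest eigenvalues of $\mathbf{H}^H\mathbf{H}$ sorted in the same
order as $\{\eta_1,..,\eta_L\}$. Using standard optimization techniques,
the optimizing power allocation policy for $\phi(\eta_1,..,\eta_L)$ is
given by the standard water-filling solution:

$$p_i^*(\mathbf{H},\eta_1,..,\eta_L) = \left(\frac{\eta_i}{\gamma\bar{N}_i\ln 2} - \frac{1}{\alpha(\epsilon)\xi_{[i]}}\right)^+. \tag{15}$$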
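For readers who want the intermediate step behind the water-filling form, the following is a sketch of the per-stream stationarity condition. It assumes that the term inside the expectation, for one stream and one channel realization, is $(\eta_i/\bar{N}_i)\log_2(1+\alpha(\epsilon)p\,\xi) - \gamma p$ with $p \ge 0$ and $\eta_i \ge 0$; the shorthand $f$, $p$ and $\xi$ is introduced here for illustration. The factor $\ln 2$ arises from differentiating the base-2 logarithm.

```latex
\begin{aligned}
f(p) &= \frac{\eta_i}{\bar{N}_i}\log_2\bigl(1+\alpha(\epsilon)\,p\,\xi\bigr) - \gamma p,
  \qquad p \ge 0,\\[4pt]
f'(p) &= \frac{\eta_i}{\bar{N}_i \ln 2}\cdot
  \frac{\alpha(\epsilon)\,\xi}{1+\alpha(\epsilon)\,p\,\xi} - \gamma = 0
  \;\Longrightarrow\;
  p = \frac{\eta_i}{\gamma \bar{N}_i \ln 2} - \frac{1}{\alpha(\epsilon)\,\xi},\\[4pt]
p^{*} &= \Bigl(\frac{\eta_i}{\gamma \bar{N}_i \ln 2}
  - \frac{1}{\alpha(\epsilon)\,\xi}\Bigr)^{+}.
\end{aligned}
```

Since $f$ is concave in $p$ when $\eta_i \ge 0$, projecting the stationary point onto $p \ge 0$ gives the maximizer, and the expectation over $\mathbf{H}$ is maximized by applying this rule to each channel realization separately.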



Hence, the Bellman equation in (14) can be solved using
policy iteration [9] in an offline manner. Once the solution of
the Bellman equation in (14) is determined, the optimal power
allocation (given a CSIT and QSI realization) is given by
$\varphi^*(\mathbf{H},\mathbf{Q}) = p^*(\mathbf{H},\delta V_1(\mathbf{Q}),\ldots,\delta V_L(\mathbf{Q}))$ as defined in (15).
Using the optimal power allocation policy $\varphi^*$, the embedded
Markov chain $\{\mathbf{Q}_m\}$ is ergodic and time reversible, and the
steady state distribution $\Omega^* = \{\omega^*(q_1,\ldots,q_L)\}$ of the queue
length process evolving under the optimal policy $\varphi^*$ can
be obtained by solving the $L$-dimensional detailed balance
equations. The average delay of the $i$-th data stream is then
given by $\bar{T}_i(\varphi^*) = \sum_{q_1,..,q_L} q_i\,\omega^*(q_1,..,q_L)$.

As a final step, we shall determine the Lagrange multiplier $\gamma$
by substituting (15) into (4) so as to satisfy the overall average
transmit power constraint $P_0$:

$$P_0 = \sum_{q_1,..,q_L}\sum_{i=1}^{L} E\big[p_i^*(\mathbf{H},\delta V_1(\mathbf{Q}),\ldots,\delta V_L(\mathbf{Q}))\,\big|\,\mathbf{Q}\big]\,\omega^*(\mathbf{Q}). \tag{16}$$

D. Summary of the Optimal Solution 

In this section, we shall summarize the major results derived
for delay-optimal performance. The optimal precoding and
power allocation policy consists of an online procedure and
an offline procedure. They are summarized below.

Offline Procedure

• Determination of Bellman Solution: For a given $\gamma$,
determine $\theta^*(\gamma)$, $\{V^*(q_1,..,q_L;\gamma)\}$ by solving the system
of equations according to (14).

• Transmit Power Constraint: Determine $\gamma$ that satisfies
the transmit power constraint in (16).⁸

The outputs of the offline procedure are $\gamma(P_0)$, $\theta^*(\gamma(P_0))$ and
$\{\delta V^*(q_1,..,q_L)\}$, which shall be used in the online procedure.

Online Procedure

• Step 1) SVD of CSIT: Given the current CSIT $\mathbf{H}$, obtain
the largest $L$ eigenvalues ($\xi_1 \le \xi_2 \le \cdots \le \xi_L$) of the
matrix $\mathbf{H}^H\mathbf{H}$ and the corresponding eigenvectors.

• Step 2) Optimal Precoder and Data Stream Index
Assignment: The optimal precoder is $\mathbf{P} = \mathbf{U}\mathbf{S}_p$, where
$\mathbf{S}_p = \mathrm{diag}\{\sqrt{p_1^*},..,\sqrt{p_L^*}\}$ and $\mathbf{U} \in \mathbb{C}^{N_t \times L}$ contains
the $L$ eigenvectors obtained in Step 1 as columns. The
$L$ eigenvalues (as well as the corresponding
eigenvectors) are sorted in the same order as
$\delta V_1(\mathbf{Q}),..,\delta V_L(\mathbf{Q})$ for the given QSI $\mathbf{Q}$.⁹

• Step 3) Optimal Power Allocation: Based on the
precoder and data stream index association in Step
2, the power allocation is given by
$\varphi^*(\mathbf{H},\mathbf{Q}) = p^*(\mathbf{H},\delta V_1(\mathbf{Q}),\ldots,\delta V_L(\mathbf{Q}))$ as defined in (15).
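The three online steps above can be sketched as follows. This is a minimal illustration assuming NumPy; the function name `online_precoder`, the argument names and the numeric values are hypothetical, the offline quantities $\delta V_i(\mathbf{Q})$ are taken as given inputs, and the water-filling rule is the stationary point of $(\eta_i/\bar{N}_i)\log_2(1+\alpha(\epsilon)p\,\xi) - \gamma p$ projected onto $p \ge 0$.

```python
import numpy as np

def online_precoder(H, delta_v, gamma, alpha, n_bar):
    """Sketch of the online procedure for one CSIT/QSI realization.

    H       : N_r x N_t channel matrix
    delta_v : length-L array, offline values delta V_i(Q) for the current QSI
    Returns the precoder P, the powers p, and the eigenvalue xi given to each stream.
    """
    delta_v = np.asarray(delta_v, dtype=float)
    L = len(delta_v)
    # Step 1: eigen-decomposition of H^H H (eigh returns ascending eigenvalues).
    eigvals, eigvecs = np.linalg.eigh(H.conj().T @ H)
    top = np.argsort(eigvals)[::-1][:L]          # L largest, in descending order
    # Step 2: the stream with the k-th largest delta_v gets the k-th largest eigenvalue.
    stream_rank = np.argsort(delta_v)[::-1]
    xi = np.empty(L)
    U = np.empty((H.shape[1], L), dtype=complex)
    for k, stream in enumerate(stream_rank):
        xi[stream] = eigvals[top[k]]
        U[:, stream] = eigvecs[:, top[k]]
    # Step 3: water-filling; tiny eigenvalues are floored to avoid division by zero.
    level = delta_v / (gamma * n_bar * np.log(2.0))
    p = np.maximum(level - 1.0 / (alpha * np.maximum(xi, 1e-12)), 0.0)
    P = U * np.sqrt(p)                           # same as U @ diag(sqrt(p))
    return P, p, xi

# Illustrative inputs (assumed): L = 3 streams on a 4 x 4 link.
rng = np.random.default_rng(1)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
delta_v = np.array([2.0, 3.0, 1.5])
P, p, xi = online_precoder(H, delta_v, gamma=1.0, alpha=1.0, n_bar=1.0)
print(xi, p)
```

With these inputs the second stream has the largest $\delta V$ and therefore receives the largest eigenvalue, and $\mathbf{P}^H\mathbf{H}^H\mathbf{H}\mathbf{P}$ is diagonal, which is the channel-diagonalizing structure of the precoder.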

IV. Low Complexity Solution 

While the solution derived in the previous section is optimal
and the solution to the Bellman equation (14) can be carried
out in an offline manner, the complexity involved is huge as
it involves solving for an exponentially large (w.r.t. $L$) number
of variables (worst case complexity of $O((N+1)^L)$). In this
section, we propose a low complexity suboptimal solution as
an alternative, which has a worst case complexity of $O(NL)$
in the offline procedure but close-to-optimal performance.



A. Decomposition of the MDP 

Using the optimal unitary precoding solution $\mathbf{U}$ in Theorem 1,
the Bellman equation is coupled among the $L$ data
streams due to the sorting requirement of the eigenvalues
according to $\delta V_1(\mathbf{Q}),..,\delta V_L(\mathbf{Q})$. In order to obtain a simple
solution, we consider a static sorting arrangement for the
$L$ largest eigenvalues $\xi_1,..,\xi_L$. Specifically, we shall sort
the $L$ eigenvalues in the same order as $\beta_1,..,\beta_L$ (which
represent the relative importance of the $L$ data streams).
While this is suboptimal in the strict sense, the proposed static
sorting scheme will not cause too much performance loss,
especially for highly asymmetric cases ($\beta_1 \gg \beta_2 \gg \cdots \gg \beta_L$)
or highly symmetric cases ($\beta_1 \approx \beta_2 \approx \cdots \approx \beta_L$). Using the static
sorting scheme and given a stationary power control policy
$\varphi = (\varphi_1,..,\varphi_L)$, the MDP state transition probability as
depicted in Fig. 2 is decomposable among the $L$ data streams.
The average cost per stage in (6) under $\varphi = (\varphi_1,..,\varphi_L)$ can
be decomposed as $J_\beta^\varphi = \sum_{i=1}^{L} J_{\beta_i}^{\varphi_i}$, where

$$J_{\beta_i}^{\varphi_i} = \limsup_{M \to \infty}\frac{1}{M}\sum_{m=1}^{M} E\big[g_i(Q_{i,m},\bar{\varphi}_i(Q_{i,m}))\big], \tag{17}$$

$$g_i(Q_{i,m},\bar{\varphi}_i(Q_{i,m})) = \beta_i Q_{i,m} + \gamma\bar{\varphi}_i(Q_{i,m}), \tag{18}$$

$$\bar{\varphi}_i(Q_{i,m}) = E\left[\varphi_i(\chi_m)|Q_{i,m}\right]. \tag{19}$$

Hence, the original "minimal average cost per stage" problem
$\min_\varphi J_\beta^\varphi$ can be decomposed into $L$ individual subproblems
$J_{\beta_i}^* = \min_{\varphi_i} J_{\beta_i}^{\varphi_i}$ for $i = 1,..,L$. Considering the $i$-th
subproblem, the Bellman equation is given by:

$$\theta_i + V_i(q) = \inf_{\varphi_i(q)}\Big\{g_i(q,\bar{\varphi}_i(q)) + \tau\lambda_i V_i((q+1)\wedge N) + \tau\bar{\mu}_i(q)V_i([q-1]^+) + V_i(q)\big(1 - \tau\lambda_i - \tau\bar{\mu}_i(q)\big)\Big\} \tag{20}$$

for all $q \in \{0,1,..,N\}$, in which $\bar{\mu}_i(q)$ is given by

$$\bar{\mu}_i(q) = \frac{1}{\bar{N}_i}E\left[\log_2\big(1+\alpha(\epsilon)\varphi_i(\chi_m)\xi_i\big)\,\big|\,Q_{i,m} = q\right], \tag{21}$$

where $\xi_i$ is the $i$-th eigenvalue of $\mathbf{H}^H\mathbf{H}$ (sorted in the same
order as $\{\beta_1,..,\beta_L\}$) and
$\varphi_i(q) = \{p_i = \varphi_i(\mathbf{H}, Q_i = q) : \mathbf{H} \in \mathbb{C}^{N_r \times N_t}\}$ denotes the set of power allocation actions
for all CSIT realizations at a given QSI $Q_i = q$. Since
the embedded Markov chain $\{Q_{i,m}\}$ is irreducible, there is
a unique solution $(\theta_i, V_i(0),\ldots,V_i(N))$ satisfying (20) and
$\theta_i = J_{\beta_i}^*$. We shall derive a low complexity optimal solution
for the Bellman equation (20) in the next subsection.

⁸ For simplicity, in our simulation, we avoid using root-finding algorithms
to calculate $\gamma$, but calculate the corresponding $P_0$ for each given $\gamma$ by (16).

⁹ For example, let $L = 3$. Given the current QSI $\mathbf{Q} = (q_1, q_2, q_3)$, assume
$\delta V_1^*(q_1,q_2,q_3) = 2.0$, $\delta V_2^*(q_1,q_2,q_3) = 3.0$, $\delta V_3^*(q_1,q_2,q_3) = 1.5$.
Then the largest eigenvalue $\xi_{[1]}$ should be associated with the 2nd data stream.
The next largest eigenvalue $\xi_{[2]}$ should be associated with the 1st data stream
and the smallest eigenvalue $\xi_{[3]}$ should be associated with the 3rd data stream.

B. Solution to the decoupled Bellman Equation 

Without loss of generality, we shall consider the $i$-th MDP
problem. Let $\delta V_i(q) \triangleq \tau(V_i(q) - V_i(q-1))$ for $q = 1,2,..,N$.
The Bellman equation in (20) can be expressed recursively in
terms of $\{\delta V_i(q)\}$ as follows:

$$\lambda_i\,\delta V_i(q+1) = \theta_i + \phi_i(\delta V_i(q)) - \beta_i q \tag{22}$$

for $q = 0,1,\ldots,N-1$ with two boundary conditions that
$\delta V_i(0) = 0$ and $\beta_i N = \phi_i(\delta V_i(N)) + \theta_i$, where

$$\phi_i(y) = \sup_{\{p(\mathbf{H})\}} E_{\mathbf{H}}\left[\frac{y}{\bar{N}_i}\log_2\big(1+\alpha(\epsilon)p(\mathbf{H})\xi_i\big) - \gamma p(\mathbf{H})\right].$$

To solve the Bellman equation in (22), we can first choose
a testing value $\theta$ for each stream and obtain a sequence
$\{\delta V_i(1,\theta),\ldots,\delta V_i(N,\theta)\}$ inductively from (22) for
$q = 0,1,\ldots,N-1$. Define $f_i(\theta) = [\phi_i(\delta V_i(N,\theta)) + \theta]/\beta_i$;
the tuple $(\theta,\delta V_i(1,\theta),\ldots,\delta V_i(N,\theta))$ is a solution to the
Bellman equation in (22) if and only if $f_i(\theta) = N$. Since
$f_i(\theta)$ is continuous and strictly increasing in $\theta$, there exists a
unique $\theta^* = f_i^{-1}(N)$ so that $f_i(\theta^*) = N$. Correspondingly,
$(\theta^*,\delta V_i(1,\theta^*),\ldots,\delta V_i(N,\theta^*))$ is the unique solution satisfying
the Bellman equation in (22) and $\theta^*$ can be obtained
easily by the one-dimensional bisection method. Furthermore,
using standard optimization techniques, the optimal power
allocation policy (for a given QSI $Q_i = q$) is given by

$$p_i^*(\mathbf{H},q) = \left(\frac{\delta V_i(q,\theta^*)}{\gamma\bar{N}_i\ln 2} - \frac{1}{\alpha(\epsilon)\xi_i}\right)^+ \tag{23}$$

for $q = 1,2,..,N$ and $p_i^*(\mathbf{H},0) = 0$.
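The test-value-plus-bisection procedure described above can be sketched as follows for one stream. This is a minimal illustration assuming NumPy; the function names, the parameter values, and the replacement of the expectation over $\mathbf{H}$ by an average over a small set of equiprobable eigenvalue samples `xi` are all assumptions made for the example, and the power rule used inside `phi` is the projected stationary point of $(y/\bar{N}_i)\log_2(1+\alpha(\epsilon)p\,\xi) - \gamma p$.

```python
import numpy as np

def waterfill(y, xi, gamma, alpha, n_bar):
    """Power per channel state: (y / (gamma * n_bar * ln 2) - 1 / (alpha * xi))^+."""
    return np.maximum(y / (gamma * n_bar * np.log(2.0)) - 1.0 / (alpha * xi), 0.0)

def phi(y, xi, gamma, alpha, n_bar):
    """Sample-average estimate of phi_i(y) over equiprobable channel states xi."""
    p = waterfill(y, xi, gamma, alpha, n_bar)
    return float(np.mean((y / n_bar) * np.log2(1.0 + alpha * p * xi) - gamma * p))

def delta_v_sequence(theta, N, lam, beta, xi, gamma, alpha, n_bar):
    """Forward recursion lam * dV(q+1) = theta + phi(dV(q)) - beta * q, dV(0) = 0."""
    dv = np.zeros(N + 1)
    for q in range(N):
        dv[q + 1] = (theta + phi(dv[q], xi, gamma, alpha, n_bar) - beta * q) / lam
    return dv

def f(theta, N, lam, beta, xi, gamma, alpha, n_bar):
    """f_i(theta) = [phi(dV(N, theta)) + theta] / beta; a solution has f = N."""
    dv = delta_v_sequence(theta, N, lam, beta, xi, gamma, alpha, n_bar)
    return (phi(dv[N], xi, gamma, alpha, n_bar) + theta) / beta

def solve_theta(N, lam, beta, xi, gamma, alpha, n_bar, iters=200):
    """Bisection on [0, beta*N]: f(0) = 0 < N and f(beta*N) >= N since phi >= 0."""
    lo, hi = 0.0, beta * N
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid, N, lam, beta, xi, gamma, alpha, n_bar) < N:
            lo = mid
        else:
            hi = mid
    return hi

# Illustrative parameters (assumed).
params = dict(N=5, lam=0.3, beta=1.0, xi=np.array([0.2, 0.5, 1.0, 2.0, 4.0]),
              gamma=1.0, alpha=1.0, n_bar=1.0)
theta_star = solve_theta(**params)
dv_star = delta_v_sequence(theta_star, **params)
print(theta_star, dv_star)
```

The forward recursion behaves like a shooting method, so small changes in the test value $\theta$ are amplified along the sequence; keeping $N$ moderate in such an experiment keeps the bisection well conditioned.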

Remark 2: In equation (23), the power allocation solution
depends on the QSI only via the equivalent water-level, which is
proportional to $\delta V_i(q,\theta^*)/(\gamma\bar{N}_i)$. For larger queue size, the equivalent
water-level is increased. This result is also consistent with the
asymptotic delay-optimal solution for the point-to-point
single-stream system in [10].

Using the optimal power allocation policy $\varphi_i^*(q)$ for
$q = 0,1,2,..,N$, the embedded Markov chain $\{Q_{i,m}\}$ of the
$i$-th data stream is ergodic and time reversible. The steady
state distribution $\Omega(\varphi_i^*) = (\omega_0(\varphi_i^*),\omega_1(\varphi_i^*),..,\omega_N(\varphi_i^*))$ of
the queue lengths under the optimal policy $\varphi_i^*$ can be obtained
by solving the $L$ one-dimensional detailed balance equations
for all $q = 0,1,..,N-1$ combined with $\sum_{q=0}^{N}\omega_q(\varphi_i^*) = 1$.
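For a single stream, the detailed balance equations of the birth-death chain give the steady state distribution in closed form. The following is a minimal sketch assuming NumPy, arrival probability $\lambda_i\tau$ and departure probability $\bar{\mu}_i(q)\tau$ per slot, so that $\omega_q\lambda_i = \omega_{q+1}\bar{\mu}_i(q+1)$; the function name `steady_state` and the numeric values are illustrative.

```python
import numpy as np

def steady_state(lam, mu_bar):
    """Steady state distribution of one buffer from detailed balance.

    lam    : mean arrival rate
    mu_bar : mu_bar[q] is the conditional mean service rate at queue length q
             (mu_bar[0] is unused because an empty buffer has no departures)
    """
    N = len(mu_bar) - 1
    w = np.ones(N + 1)
    for q in range(N):
        # Detailed balance: w[q] * lam * tau = w[q+1] * mu_bar[q+1] * tau.
        w[q + 1] = w[q] * lam / mu_bar[q + 1]
    return w / w.sum()              # normalize so the probabilities sum to one

# Illustrative values (assumed): buffer size N = 3.
lam = 0.2
mu_bar = np.array([0.0, 0.30, 0.35, 0.40])
omega = steady_state(lam, mu_bar)
mean_queue = float(np.dot(np.arange(len(omega)), omega))
print(omega, mean_queue)
```

The quantity `mean_queue` is the average number of packets in the buffer, i.e. the per-stream average delay in packets.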

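As a final step for the power allocation policy, we have to
determine the common Lagrange multiplier $\gamma$ among the $L$
data streams to satisfy the overall average power constraint

$$P_0 = \sum_{i=1}^{L}\sum_{q=0}^{N} E_{\mathbf{H}}\big[p_i^*(\mathbf{H},q)\big]\,\omega_q(\varphi_i^*). \tag{24}$$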



C. Summary of the Low Complexity Solution 

The low complexity precoding and power allocation policy 
also consists of an online procedure and an offline procedure, 
which are summarized below. 

Offline Procedure 

• Step 1) Determination of Bellman Solutions: For i = 

l,..,L and a 7, determine {6*1(7), ■•, (7)} as well as 
{SV,{q,9lij)),...,SVL{q,9l{j))} according to (EH). 

• Step 2) Transmit Power Constraint: Solve for 7 that 
satisfies the transmit power constraint in (l24l i using one 
dimensional root-finding numerical algorithm. 

The offline complexity is only of 0{NL). The outputs of 
the offline procedure include j{Po), 9l{-/{Po)), 6*2 (7(^0)) 
as well as {,5Fi(g, 0^ (7(^0))), 5^L(g, f?2(7(Po)))}. These 
shall provide inputs to the online procedure. 

Online Procedure 

• Step 1) SVD on CSIT: Given the current CSIT $\mathbf{H}$, obtain
the largest $L$ eigenvalues ($\xi_1 \le \xi_2 \le \cdots \le \xi_L$) of the
matrix $\mathbf{H}^H \mathbf{H}$ and the corresponding eigenvectors.

• Step 2) Precoder and Data Stream Mapping: The optimal
precoder is $\mathbf{P} = \mathbf{U} \Sigma_p$, where
$\Sigma_p = \mathrm{diag}\{\sqrt{p_1}, \ldots, \sqrt{p_L}\}$ and $\mathbf{U} \in \mathbb{C}^{N_t \times L}$ contains the $L$
eigenvectors obtained in Step 1 as columns. The $L$ largest
eigenvalues are sorted in the same order as $\{\beta_1, \ldots, \beta_L\}$.

• Step 3) Optimal Power Allocation: Based on the
precoder and data stream index association in Step 2,
the power allocation of the $i$-th data stream is given by
$p_i^*(\mathbf{H}, \mathbf{Q}) = p_i^*(\mathbf{H}, q_i)$ according to (23).
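
Steps 1 and 2 of the online procedure can be sketched with NumPy as follows. The function name `online_precoder` is hypothetical, the per-stream powers are taken as given rather than computed from (23), and the rule that a larger weight is paired with a larger eigenvalue is an assumed reading of the sorting in Step 2.

```python
import numpy as np


def online_precoder(H, betas, powers):
    """Build P = U diag(sqrt(p)) from the L dominant eigenvectors of H^H H.

    H      : (Nr, Nt) complex channel matrix (the CSIT)
    betas  : length-L stream weights; larger weight -> larger eigenvalue
             (assumed interpretation of the static sorting)
    powers : length-L per-stream powers, treated as inputs here
    """
    betas = np.asarray(betas, dtype=float)
    powers = np.asarray(powers, dtype=float)
    L = betas.size
    # eigh returns the eigenvalues of a Hermitian matrix in ascending order.
    eigvals, eigvecs = np.linalg.eigh(H.conj().T @ H)
    top_vals, top_vecs = eigvals[-L:], eigvecs[:, -L:]
    # rank[i] is the position of beta_i when the weights are sorted ascending.
    rank = np.argsort(np.argsort(betas))
    xi = top_vals[rank]      # eigenvalue assigned to stream i
    U = top_vecs[:, rank]    # matching eigenvectors as columns
    P = U @ np.diag(np.sqrt(powers))
    return P, xi


rng = np.random.default_rng(0)
H = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
P, xi = online_precoder(H, betas=[1.0, 10.0], powers=[0.5, 1.5])
```

With this precoder the effective channel is diagonalized: the Gram matrix of $\mathbf{H}\mathbf{P}$ has the entries $p_i \xi_i$ on its diagonal, so the streams do not interfere under perfect CSIT.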

V. Extensions to Outdated CSIT 

When the CSIT is outdated, there will be spatial interference 
between the spatial streams of the MIMO channels, which 
further complicates the precoder design. We shall first define 
the MIMO physical layer model with CSIT error and extend 
our delay-optimal formulation and results thereafter. 

A. MIMO Physical Layer Model with CSIT Error

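Consider the case where the CSIT error is due to the
estimation noise on the reverse link pilot in a TDD system. The
MMSE estimate $\hat{\mathbf{H}}$ of the CSIT at the transmitter satisfies
$\mathbf{H} = \hat{\mathbf{H}} + \Delta\mathbf{H}$ [21], where $\Delta\mathbf{H} \sim \mathcal{CN}(0, \sigma_e^2 \mathbf{I})$.$^{10}$ Moreover,
$\mathbb{E}[\Delta\mathbf{H}^H \hat{\mathbf{H}}] = \mathbf{0}$ due to the orthogonality principle of MMSE.
Hence, $\sigma_e^2$ is a parameter which represents the CSIT quality.$^{11}$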

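Following similar methods in Section II, we shall extend
the MIMO physical layer model to accommodate the effect of
the outdated CSIT. Specifically, the conditional average SINR
of the $i$-th stream is given by

$$\mathrm{SINR}_i(\mathbf{P}) = \mathbb{E}\left[ \frac{|\mathbf{w}_i^H \mathbf{H} \mathbf{p}_i|^2}{\mathbf{w}_i^H \mathbf{A}_i \mathbf{w}_i} \,\middle|\, \hat{\mathbf{H}} \right].$$

Hence, the conditional SER (conditioned on the CSIT $\hat{\mathbf{H}}$) of
a QAM constellation admits an exponential upper bound in the
conditional average SINR, and the associated data rate of the
$i$-th stream at the target SER $\epsilon$ is
$R_i = \log_2(1 + a(\epsilon)\,\mathrm{SINR}_i(\mathbf{P}))$. Combining the
definition in (1) and the matrix inversion lemma [19], we may
express the conditional average SINR of the $i$-th stream as

$$\mathrm{SINR}_i(\mathbf{P}) = \mathbb{E}\left[ E_{ii}^{-1}(\mathbf{P}) - 1 \,\middle|\, \hat{\mathbf{H}} \right] \ge \left( \mathbb{E}\left[ E_{ii}(\mathbf{P}) \,\middle|\, \hat{\mathbf{H}} \right] \right)^{-1} - 1,$$

where the last step results from Jensen's inequality. Hence,
we have a lower bound for the average supported data rate
(conditioned on $\hat{\mathbf{H}}$) at the target SER $\epsilon$ given by
$R_i \ge \log_2(1 + a(\epsilon)(\bar{E}_{ii}^{-1}(\mathbf{P}) - 1))$, where $\bar{E}_{ii} = \mathbb{E}[E_{ii} \mid \hat{\mathbf{H}}]$.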
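
The Jensen step can be checked numerically. The sketch below draws hypothetical per-stream MSE samples in (0, 1] (a uniform distribution chosen only for illustration) and compares the average of 1/E - 1 with the bound 1/mean(E) - 1.

```python
import random


def sinr_and_bound(mse_samples):
    """Return (mean of 1/E - 1, 1/mean(E) - 1) for MSE samples in (0, 1]."""
    n = len(mse_samples)
    avg_sinr = sum(1.0 / e - 1.0 for e in mse_samples) / n
    bound = 1.0 / (sum(mse_samples) / n) - 1.0
    return avg_sinr, bound


random.seed(1)
samples = [random.uniform(0.1, 1.0) for _ in range(10_000)]  # hypothetical MSEs
avg_sinr, bound = sinr_and_bound(samples)
```

The first value is never smaller than the second because 1/x is convex for positive x; the two coincide only when the MSE does not fluctuate, which corresponds to perfect CSIT.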



B. Extension of the Formulation and Results 

The delay optimization problem formulation in (5) can be
easily extended for outdated CSIT by modifying the system
state variable to $\chi = (\hat{\mathbf{H}}, \mathbf{Q})$. Theorem 1 can be extended as
follows.

Corollary 1: For any realization of the system state
$\chi = (\hat{\mathbf{H}}, \mathbf{Q})$, the optimal precoding action $\pi(\chi) = \mathbf{P}$ w.r.t. (10)
is given by

$$\pi(\chi) = \mathbf{P} = \mathbf{U} \Sigma_p, \qquad (25)$$

where $\mathbf{U} \in \mathbb{C}^{N_t \times L}$ is a unitary matrix consisting of $L$
eigenvectors of $\hat{\mathbf{H}}^H \hat{\mathbf{H}} + N_r \sigma_e^2 \mathbf{I}$ corresponding to the $L$ largest
eigenvalues and $\Sigma_p = \mathrm{diag}\{\sqrt{p_1}, \ldots, \sqrt{p_L}\}$ is a diagonal
matrix containing the power allocations over the $L$ spatial
channels. Note that the $L$ largest eigenvalues $\{\xi_1, \ldots, \xi_L\}$
are sorted in the same order as
$\eta_i \triangleq V(q_1, \ldots, q_L) - V(q_1, \ldots, [q_i - 1]^+, \ldots, q_L)$.

Using the precoder structure given by Corollary 1, the
conditional average MSE becomes
$\mathrm{diag}[\bar{\mathbf{E}}] = [(1 + p_1 \zeta_1)^{-1}, \ldots, (1 + p_L \zeta_L)^{-1}]$ [19],
and hence the conditional average service rate $\mu_i(\mathbf{Q})$ is
determined by the expectation of $\log_2(1 + a(\epsilon) p_i \zeta_i)$. As
a result, all the subsequent formulation and solutions can be
applied by replacing $\mathbf{H}$ with the estimated CSIT $\hat{\mathbf{H}}$.
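
For reference, the link between the diagonal MSE structure and the per-stream rate can be written out in one line. The derivation below only substitutes the conditional average MSE of the $i$-th stream, written here as $\bar{E}_{ii}$ with effective channel gain $\zeta_i$, into the rate lower bound of Section V-A; it assumes no scaling factors beyond those visible in the text.

```latex
\bar{E}_{ii}(\mathbf{P}) = (1 + p_i \zeta_i)^{-1}
\;\Longrightarrow\;
\bar{E}_{ii}^{-1}(\mathbf{P}) - 1 = p_i \zeta_i
\;\Longrightarrow\;
R_i \ge \log_2\bigl(1 + a(\epsilon)\, p_i \zeta_i\bigr).
```

The bound therefore has the same form as a rate expression under perfect CSIT, with the eigenchannel gain replaced by the effective gain $\zeta_i$.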

$^{10}$ For the detailed error model, please refer to our full version [20].
$^{11}$ We assume that the receiver has perfect knowledge of the CSIR for
detection and decoding. This is because, in practice, a relatively strong
forward link pilot channel is available from the base station to the
receivers, so that the CSIR estimation error is insignificant relative to
that of the CSIT.

Fig. 3. Comparison of the average delay under optimal and low complexity
solutions under perfect CSIT. $N_t = N_r = 2$, $\beta_1 = 1$, $\beta_2 = 10$.

Fig. 4. Average delay of the proposed low-complexity solution for different
$(N_t, N_r)$ configurations under perfect CSIT. $\beta_1 = 1$, $\beta_2 = 10$.



VI. Numerical Results and Discussions 

In this section, we evaluate the proposed solutions to the 
delay sensitive precoder and power adaptation design via 
numerical simulations. Two data streams are considered with 
weights $\beta_1$, $\beta_2$ in (5), respectively. The mean packet size
and mean arrival rate for the two streams are the same, i.e.
$\bar{N}_1 = \bar{N}_2 = 200$ bits per packet and $\lambda_1 = \lambda_2 = 0.02$ packets
per channel use time $\tau$. The buffer size is $N = 4$ for each
stream.$^{12}$ The scheduling time unit $\tau$ and the target SER $\epsilon$ are
fixed at 5 ms and 1%, respectively.

Fig. 3 compares the average delay of the two data streams
under the optimal and low complexity solutions for a 2-by-2
MIMO system. As the figure shows, both proposed solutions
fully support heterogeneous delay-sensitive services.
Furthermore, the low complexity solution has close-to-optimal
performance with a worst case complexity of only $O(NL)$,
which indicates its practical significance.

Fig. 4 depicts the average delay of the two streams of
the proposed low-complexity solution under different
configurations of transmit and receive antennas. In Fig. 5, we
set $\beta_1 = \beta_2 = 1$ and compare the sum average delay of
the proposed scheme for a 2-by-2 MIMO system with two
baselines: 1) the Round-Robin scheme, i.e. the two streams
are serviced in TDMA fashion with equally allocated time
slots; 2) the CSIT only scheme, i.e. the precoder and power
adaptation for the two streams are designed purely based on
the outdated CSIT. The proposed scheme achieves a gain of
more than 10 dB over the two baselines. The figure also
suggests that spatial multiplexing may not help effectively
without adapting to both the CSI and the QSI, and that the
CSIT only scheme is much more sensitive to the CSIT quality
than the proposed scheme. This illustrates the robustness of
our proposed scheme to CSIT errors.

$^{12}$ This implies that the delay for a packet is at most four packets. Since
we are considering delay-sensitive applications, this can be a valid
assumption.

Fig. 5. Sum average delay of the proposed low complexity scheme and two
baseline schemes, given different values of CSIT error variance $\sigma_e^2$.
$\beta_1 = \beta_2 = 1$, $N_t = N_r = 2$.

VII. Summary

We considered a delay-sensitive MIMO system with $L$
heterogeneous data streams spatially multiplexed together. The
design of the precoding policy achieving the Pareto optimal delay
tradeoff is formulated as an $L$-dimensional MDP problem. A
low complexity solution with worst case complexity $O(NL)$
is proposed by decomposing the original problem into $L$
one-dimensional subproblems based on static sorting. Numerical
results verify the advantages of taking both the QSI and the
CSIT error into account in dynamic precoder design.